0.1 Descriptive statistics

The effect size estimates ranged from -3.80 to 14.07 (\(M = 0.20, SD = 1.72\)). Five effect sizes lay more than 3 standard deviations from the mean effect size and may be considered for removal as outliers. Several studies reported multiple effect sizes (range: 1–30; most studies reported a single effect size).

0.2 Meta-analysis

Meta-analyses were conducted in R (R Core Team 2021) using the R packages metafor (Viechtbauer 2010) and pema (Van Lissa and Van Erp 2021). To estimate overall effects, we used three-level meta-analysis to account for dependent effect sizes within studies (Van den Noortgate et al. 2015). Let \(y_{jk}\) denote observed effect size \(j\) in study \(k\). The multilevel model is then given by the following equations:

\[ \left. \begin{aligned} y_{jk} &= \beta_{jk} + \epsilon_{jk} &\text{where } \epsilon_{jk} &\sim N(0, \sigma^2_{\epsilon_{jk}})\\ \beta_{jk} &= \theta_k + w_{jk} &\text{where } w_{jk} &\sim N(0, \sigma^2_{w})\\ \theta_{k} &= \delta + b_{k} &\text{where } b_k &\sim N(0, \sigma^2_{b}) \end{aligned} \right\} \]

The first equation states that each observed effect size equals its underlying population effect size \(\beta_{jk}\), plus sampling error \(\epsilon_{jk}\). The second equation states that the population effect sizes within a study are a function of a study-specific true effect size \(\theta_k\), plus within-study residuals \(w_{jk}\). The third equation states that the study-specific true effect sizes are distributed around an overall mean effect \(\delta\), with between-study residuals \(b_k\).
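Assuming the data are in long format with one row per effect size, a model of this form can be fitted with metafor's `rma.mv()`. The following is only a sketch: the data frame and column names (`dat`, `yi`, `vi`, `study`, `es_id`) are hypothetical placeholders, not objects from the present analysis.

```r
# Three-level model sketch with metafor; dat, yi, vi, study, and es_id
# are hypothetical placeholder names.
library(metafor)

res <- rma.mv(
  yi = yi,                     # observed effect sizes y_jk
  V = vi,                      # known sampling variances sigma^2_epsilon_jk
  random = ~ 1 | study/es_id,  # b_k (between studies) and w_jk (within studies)
  data = dat
)
summary(res)  # sigma^2.1 = between-study variance, sigma^2.2 = within-study variance
```

The nested `random = ~ 1 | study/es_id` term is what makes the model three-level: it adds a random intercept per study and a second random intercept per effect size within study.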

Separate meta-analyses were conducted for each of the samples. The overall pooled effect sizes are reported below.

The overall effect size estimate differed significantly from zero for CBDhumanexperimental, CBDanimalconditioned, AM404conditioned, URB597conditioned.

The within-studies variance component \(\sigma^2_w\) (between effect sizes) was significant for CBDhumanexperimental, CBDanimalunconditioned, AM404unconditioned, URB597conditioned, URB597unconditioned.

The between-studies variance \(\sigma^2_b\) was significant for CBDanimalunconditioned, AM404unconditioned, URB597unconditioned, PF3485unconditioned.

This indicates that there was substantial heterogeneity in effect sizes, both within and between studies, in most of the samples.

0.3 Forest plots

The forest plots for the aforementioned three-level meta-analyses are presented below. Within each plot, studies are ranked by their sampling variance \(vi\); thus, the most precise estimates are at the bottom, near the overall effect.
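As a sketch, this ordering can be produced in metafor through the `order` argument of `forest()`; `res` below is a hypothetical fitted metafor model object, not one from the present analysis.

```r
# Hypothetical example: res is a fitted metafor model object
library(metafor)
forest(res, order = "prec")  # sort rows by precision, i.e., by sampling variance vi
```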

Figure 0.1: Forest plot for CBDhumanexperimental

Figure 0.2: Forest plot for CBDanimalconditioned

Figure 0.3: Forest plot for CBDanimalunconditioned

Figure 0.4: Forest plot for AM404conditioned

Figure 0.5: Forest plot for AM404unconditioned

Figure 0.6: Forest plot for URB597conditioned

Figure 0.7: Forest plot for URB597unconditioned

Figure 0.8: Forest plot for PF3485unconditioned

0.4 Moderator analyses

The effects of multiple moderators were investigated using meta-regression. For two continuous variables, dose and HED, a quadratic term was computed to examine a potential non-linear (U-shaped) effect. Categorical variables were dummy-coded. The resulting moderator matrix had 25 columns. As this often exceeded the number of available effect sizes per sample, these models were not identified, and variable selection was required. Three steps were taken to address this. First, variables and categories that did not occur within a given subset of the data were omitted. Second, some dummy variables were redundant because some studies had identical values on multiple dummy variables. Only one variable of each redundant set was retained, and its name was updated to reflect all redundant dummies it represents. For example, all of the studies in the category “Both” of the variable “sex” used the public speaking test, and no other sex category used this test. These two dummy variables are therefore identical, and their effects cannot be distinguished; the analysis reports their joint effect as sexBoth.anxiety_testspeaking. Third, despite these measures, many meta-regression models dropped some or all of the predictors, or failed to converge entirely, suggesting the models were empirically non-identified. Although these models are reported below, we advise against their substantive interpretation.
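The first two pruning steps described above can be sketched as follows; `X` is a hypothetical data frame of dummy-coded moderators for one subset of the data, not an object from the present analysis.

```r
# Sketch of the first two pruning steps; X is a hypothetical data frame
# of dummy-coded moderators for one subset of the data.

# Step 1: drop moderators that do not vary within this subset
X <- X[, vapply(X, function(x) length(unique(x)) > 1, logical(1)), drop = FALSE]

# Step 2: keep one column per set of identical (redundant) dummies,
# renaming it to list all the dummies it represents
dup <- duplicated(as.list(X))
merged <- vapply(seq_along(X), function(i) {
  same <- vapply(X, identical, logical(1), y = X[[i]])
  paste(names(X)[same], collapse = ".")  # e.g. "sexBoth.anxiety_testspeaking"
}, character(1))
names(X) <- merged
X <- X[!dup]  # retain the first column of each redundant set
```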

The problems with meta-regression suggest that a technique is required that performs variable selection during analysis. Such a technique was recently developed: Bayesian regularized meta-regression (BRMA), as implemented in the pema R package (Van Lissa and Van Erp 2021). By imposing a regularizing (horseshoe) prior on the regression coefficients, BRMA shrinks all coefficients towards zero, which aids empirical model identification. Coefficients must overwhelm the prior in order to become credibly different from zero. This method thus also performs variable selection, identifying which moderators are important in predicting the effect size. The resulting regression coefficients are biased towards zero by design, but the estimate of residual heterogeneity \(\tau^2\) is unbiased. Note that, as this is a Bayesian model, inference is based on credible intervals. A 95% credible interval is interpreted as follows: the population value falls within this interval with 95% probability. This differs from the interpretation of frequentist confidence intervals: in the long run, 95% of confidence intervals contain the population value.
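A BRMA fit with pema might look as follows. This is a sketch based on the package's documented interface; the data frame `dat` and the moderator names are hypothetical placeholders.

```r
# Sketch of a BRMA fit with pema's brma(); dat and the moderator names
# are hypothetical. dat is assumed to contain the effect sizes (yi),
# sampling variances (vi), and the moderator columns.
library(pema)

fit <- brma(
  yi ~ dose + dose_sq + sexBoth.anxiety_testspeaking,  # hypothetical moderators
  data = dat,
  method = "hs"  # regularizing horseshoe prior
)
summary(fit)     # posterior means and 95% credible intervals per coefficient
```

Coefficients whose 95% credible interval excludes zero despite the shrinkage would be retained as relevant moderators.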

To examine the effect of a categorical variable, a reference category must be chosen. Dummy variables then encode the difference between each remaining category and this reference category. In the results, the intercept represents the expected effect size for a study that falls within the reference category on all categorical variables. The coefficient of a dummy variable represents the difference between that category and the reference category. If a dummy variable has a significant effect, that group's mean differs significantly from the reference category's mean (i.e., from the intercept).

Note that in penalized regression, predictors are usually standardized. However, the effect of standardized dummies cannot be meaningfully interpreted. Therefore, only continuous predictors were standardized in this analysis. This may give dummy variables a slight advantage, making them more likely to reach significance than continuous predictors.
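A minimal sketch of this selective standardization, assuming a hypothetical moderator data frame `mods` in which dummy variables are coded 0/1:

```r
# Standardize only continuous moderators; 0/1 dummies are left as-is.
# mods is a hypothetical data frame of moderators.
is_dummy <- vapply(mods, function(x) all(x %in% c(0, 1)), logical(1))
mods[!is_dummy] <- lapply(mods[!is_dummy], function(x) as.numeric(scale(x)))
```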

0.4.1 Classic meta-regression

Note that analyses containing VIF values greater than 5 should be regarded as problematic due to multicollinearity. This applies to nearly all models.

Model for CBDhumanexperimental did not converge.

Model for CBDanimalconditioned did not converge.

Model for AM404conditioned did not converge.

Model for URB597conditioned did not converge.

0.4.2 Bayesian regularized meta-regression

0.5 Subgroup plots

1 Conclusion

Based on three-level multilevel meta-analyses, there is limited evidence that overall effects are non-zero in the population, except for the samples “Acq retr to ctx” and “Ext retr to ctx.” All samples showed significant between-studies variance, except “Ext retr to cue.” Conversely, none of the samples showed significant within-studies variance, except “Acq retr to ctx.” There is thus substantial evidence that heterogeneity in effect sizes is mostly due to between-studies differences.

Classic meta-regression analyses were largely invalid for moderator analysis because of high multicollinearity among predictors. We therefore used BRMA analyses, which are robust to multicollinearity and perform variable selection by shrinking regression coefficients towards zero. These BRMA analyses revealed no consistent evidence of any moderator effect across samples.

R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Van den Noortgate, Wim, José Antonio López-López, Fulgencio Marín-Martínez, and Julio Sánchez-Meca. 2015. “Meta-Analysis of Multiple Outcomes: A Multilevel Approach.” Behavior Research Methods 47 (4): 1274–94. https://doi.org/10.3758/s13428-014-0527-2.
Van Lissa, Caspar J., and Sara Van Erp. 2021. “Select Relevant Moderators Using Bayesian Regularized Meta-Regression.” PsyArXiv. https://doi.org/10.31234/osf.io/6phs5.
Viechtbauer, Wolfgang. 2010. “Conducting Meta-Analyses in R with the metafor Package.” Journal of Statistical Software 36 (3): 1–48. https://doi.org/10.18637/jss.v036.i03.